Abstract

In this paper we introduce and study a new complexity measure for finite words. For a positive integer d, special scattered subwords, called super-d-subwords, in which the gaps are of length at least d − 1, are defined. We give methods to compute the super-d-complexity (the total number of different super-d-subwords) of rainbow words (words with pairwise different letters) by recursive algorithms, by mathematical formulas and by graph algorithms. For general words, with letters from a given alphabet without any restriction, the problem of the maximum value of the super-d-complexity over all words of length n is presented.

Subject Classifications: MSC2010: 68R15 CCS1998: G.2.1, F.2.2 

1 A new complexity measure: the super-d-complexity

Sequences of characters called words or strings are widely studied in combinatorics, and are used in various fields of science (e.g. chemistry, physics, social sciences, biology [?]). The elements of a word are called letters. A contiguous part of a word (obtained by erasing a prefix and/or a suffix) is a subword or factor. If we erase arbitrary letters from a word, what is obtained is a scattered subword. Special scattered subwords, in which consecutive letters are at distance at most d (d ≥ 1) in the original word, are called d-subwords [?]. In this paper we define another kind of scattered subword, in which the original distance between two letters that are consecutive in the subword is at least d (d ≥ 1); these will be called super-d-subwords.

One can easily observe that in any given word the 1-subwords are exactly the (ordinary) subwords, and the super-1-subwords are exactly the scattered subwords.

The complexity of a word is defined as the total number of its different subwords. The definitions of d-complexity and super-d-complexity are similar.

For a (finite) alphabet $\Sigma$, as usual, $\Sigma^n$ and $\Sigma^*$ are the sets of all words of length n, and of all finite words, respectively, over $\Sigma$.

In order to formalize the above, we introduce the following two definitions. 
Definition 1 Let n, d and s be positive integers, and $u = x_1 x_2 \cdots x_n \in \Sigma^n$. A super-d-subword of length s of u is defined as $v = x_{i_1} x_{i_2} \cdots x_{i_s}$, where
$$i_1 \ge 1, \qquad d \le i_{j+1} - i_j < n \ \text{ for } j = 1, 2, \ldots, s-1, \qquad i_s \le n.$$

Definition 2 The super-d-complexity of a word is the total number of its different super-d-subwords.

For example, the super-2-subwords of the word abcdef are the following: a, ac, ad, ae, af, ace, acf, adf, b, bd, be, bf, bdf, c, ce, cf, d, df, e, f; therefore the super-2-complexity of this word is 20.
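Definitions 1 and 2 can be checked mechanically on small words. The following Python sketch enumerates super-d-subwords by brute force over all increasing position sets; the function name `super_d_subwords` is our own hypothetical choice, and the method is exponential in the word length, so it is meant only for verifying small examples such as the one above.

```python
from itertools import combinations


def super_d_subwords(word: str, d: int) -> set[str]:
    """Distinct super-d-subwords of `word` (Definition 1), by brute force."""
    n = len(word)
    found = set()
    for s in range(1, n + 1):                     # subword length
        for idx in combinations(range(n), s):     # increasing positions i_1 < ... < i_s
            # consecutive chosen positions must be at distance at least d
            if all(idx[j + 1] - idx[j] >= d for j in range(s - 1)):
                found.add("".join(word[i] for i in idx))
    return found


print(len(super_d_subwords("abcdef", 2)))  # 20
print(len(super_d_subwords("abcdef", 1)))  # 63, i.e. all scattered subwords
```

The second call illustrates the remark that super-1-subwords are exactly the scattered subwords.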

2 Computing the super-d-complexity of rainbow words

Words with pairwise different letters are called rainbow words. The super-d-complexity of a rainbow word of length n does not depend on which letters it contains, and is denoted by $S(n,d)$.

Let us denote by $b_{n,d}(i)$ the number of super-d-subwords which begin at the i-th position in a rainbow word of length n. Using our previous example (abcdef), we can see that $b_{6,2}(1) = 8$, $b_{6,2}(2) = 5$, $b_{6,2}(3) = 3$, $b_{6,2}(4) = 2$, $b_{6,2}(5) = 1$, and $b_{6,2}(6) = 1$.

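We immediately get the following formula:
$$b_{n,d}(i) = 1 + b_{n,d}(i+d) + b_{n,d}(i+d+1) + \cdots + b_{n,d}(n), \qquad \text{for } n > d,\ 1 \le i \le n-d, \tag{1}$$
and
$$b_{n,d}(1) = 1 \qquad \text{for } n \le d.$$
(Since $b_{n,d}(i) = b_{n-i+1,d}(1)$ for rainbow words, the latter also gives $b_{n,d}(i) = 1$ for $n-d < i \le n$.)

The super-d-complexity of rainbow words can be computed by the following formula:
$$S(n,d) = \sum_{i=1}^{n} b_{n,d}(i). \tag{2}$$
This can be expressed also as
$$S(n,d) = \sum_{k=1}^{n} b_{k,d}(1), \tag{3}$$
because of the formula
$$S(n+1,d) = S(n,d) + b_{n+1,d}(1),$$
which holds since a super-d-subword of $a_1 a_2 \cdots a_{n+1}$ either begins with $a_1$ ($b_{n+1,d}(1)$ of them) or is a super-d-subword of $a_2 \cdots a_{n+1}$ ($S(n,d)$ of them).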
Table 1: Values of $S(n,d)$.



In the case d = 1 the complexity $S(n,1)$ can be computed easily: $S(n,1) = 2^n - 1$. This is equal to the n-complexity of rainbow words of length n.
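Formulas (1), (2) and (3) translate directly into a bottom-up computation. In the Python sketch below (the function names are hypothetical), the values $b_{n,d}(i)$ are filled from right to left, so every term on the right-hand side of (1) is already known when it is needed; positions with $i > n - d$ automatically get the value 1 because their sum is empty.

```python
def b_values(n: int, d: int) -> list[int]:
    """[b_{n,d}(1), ..., b_{n,d}(n)] for a rainbow word, from formula (1)."""
    b = [0] * (n + 1)                      # 1-based; b[0] unused
    for i in range(n, 0, -1):              # right to left: b[k] for k > i is known
        # 1 for the one-letter subword itself, plus all continuations at k >= i + d
        b[i] = 1 + sum(b[k] for k in range(i + d, n + 1))
    return b[1:]


def S_by_formula_2(n: int, d: int) -> int:
    """Formula (2): sum of b_{n,d}(i) over all starting positions i."""
    return sum(b_values(n, d))


def S_by_formula_3(n: int, d: int) -> int:
    """Formula (3): sum of b_{k,d}(1) for k = 1, ..., n."""
    return sum(b_values(k, d)[0] for k in range(1, n + 1))


print(b_values(6, 2))        # [8, 5, 3, 2, 1, 1]
print(S_by_formula_2(6, 2))  # 20
print(S_by_formula_3(6, 2))  # 20
```

This quadratic-time sketch is only meant to make the formulas concrete; the algorithms below compute the same numbers more economically.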

In the sequel we will present different methods to compute the super-d-complexity of rainbow words. In the description of algorithms the pseudocode conventions from [2] are used.



2.1 Computing by recursive algorithm 

From formula (1) the following algorithm is obtained for the computation of $b_{n,d}(i)$.

The numbers $b_{n,d}(k)$ (k = 1, 2, ..., n) for a given n and d are obtained in the array $b = (b_1, b_2, \ldots, b_n)$, which is a global parameter in the following algorithms. Initially all these elements are equal to −1. The call for the given n and d and the desired i is:

Input n, d, i
for k ← 1 to n
    do b_k ← −1
B(n, d, i)
Output b_1, b_2, ..., b_n



The recursive algorithm is the following: 
B(n, d, i)
1 p ← 1
2 for k ← i + d to n
3     do if b_k = −1
4           then B(n, d, k)
5        p ← p + b_k
6 b_i ← p
7 return

If the call is B(8, 2, 1), the elements will be obtained in the following order: $b_7 = 1$, $b_8 = 1$, $b_5 = 3$, $b_6 = 2$, $b_3 = 8$, $b_4 = 5$, and $b_1 = 21$ ($b_2$ is not needed and keeps its initial value −1).
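A direct Python rendering of the initialisation step and of procedure B is sketched below. As an assumption of this sketch, the global array b of the pseudocode becomes a list owned by the hypothetical wrapper `compute_b` and shared with the nested function `B`; index 0 is unused so that indices match the 1-based pseudocode, and an extra list records the order in which entries are completed.

```python
def compute_b(n: int, d: int, start: int) -> tuple[list[int], list[int]]:
    """Initialisation step plus procedure B(n, d, start), transcribed into Python.

    Returns the array b (1-based, b[0] unused; -1 marks entries never computed)
    and the order in which entries were filled in.
    """
    b = [-1] * (n + 1)                 # for k <- 1 to n do b_k <- -1
    order: list[int] = []

    def B(i: int) -> None:
        p = 1                          # the one-letter subword starting at i
        for k in range(i + d, n + 1):  # every admissible next position
            if b[k] == -1:             # not computed yet: compute it first
                B(k)
            p += b[k]
        b[i] = p
        order.append(i)

    B(start)
    return b, order


b, order = compute_b(8, 2, 1)
print(order)                   # [7, 8, 5, 6, 3, 4, 1]
print([b[k] for k in order])   # [1, 1, 3, 2, 8, 5, 21]
```

The memoisation through the sentinel −1 is what keeps the recursion from recomputing shared values such as $b_7$ and $b_8$.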



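Lemma 3 $b_{n,2}(1) = F_n$, where $F_n$ is the n-th Fibonacci number ($F_1 = F_2 = 1$, $F_n = F_{n-1} + F_{n-2}$).

Proof. Let us consider a rainbow word $a_1 a_2 \cdots a_n$ and let us count all of its super-2-subwords which begin with $a_1$. If we replace $a_2$ by $a_1$ in each super-2-subword which begins with $a_2$, we again obtain super-2-subwords; these are exactly the super-2-subwords beginning with $a_1$ whose second letter (if any) is not $a_3$. If we prefix an $a_1$ to each super-2-subword which begins with $a_3$, we again obtain super-2-subwords, namely those beginning with $a_1 a_3$. Since $b_{n,2}(2) = b_{n-1,2}(1)$ and $b_{n,2}(3) = b_{n-2,2}(1)$, we get
$$b_{n,2}(1) = b_{n-1,2}(1) + b_{n-2,2}(1).$$
So $b_{n,2}(1)$ is a Fibonacci number, and because $b_{1,2}(1) = b_{2,2}(1) = 1$, we obtain $b_{n,2}(1) = F_n$. □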

Theorem 4 $S(n,2) = F_{n+2} - 1$, where $F_n$ is the n-th Fibonacci number.

Proof. From (3) and Lemma 3,
$$S(n,2) = b_{1,2}(1) + b_{2,2}(1) + b_{3,2}(1) + b_{4,2}(1) + \cdots + b_{n,2}(1) = F_1 + F_2 + \cdots + F_n = F_{n+2} - 1.$$ □
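As a quick consistency check of Lemma 3 and Theorem 4, take the example abcdef of Section 1 (n = 6, d = 2), where $b_{6,2}(1) = 8 = F_6$; the theorem then gives

$$S(6,2) = F_1 + F_2 + F_3 + F_4 + F_5 + F_6 = 1 + 1 + 2 + 3 + 5 + 8 = 20 = 21 - 1 = F_8 - 1,$$

in agreement with the 20 super-2-subwords listed there.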

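Introducing the notation $M_{n,d} = b_{n,d}(1)$, then by the formula
$$b_{n,d}(1) = b_{n-1,d}(1) + b_{n-d,d}(1),$$
which is obtained as in the proof of Lemma 3 (replace $a_2$ by $a_1$ in the super-d-subwords beginning with $a_2$, and prefix $a_1$ to those beginning with $a_{d+1}$), a generalized middle sequence¹ (see the sequence A000930 in [8]) will be obtained in the following recursive way:
$$M_{n,d} = M_{n-1,d} + M_{n-d,d}, \qquad \text{for } n \ge d \ge 2, \tag{4}$$
$$M_{0,d} = 0,\quad M_{1,d} = 1,\ \ldots,\ M_{d-1,d} = 1.$$

¹ From [8]: $a_0 = a_1 = a_2 = 1$; thereafter $a_n = a_{n-1} + a_{n-3}$. Might be called the Middle Sequence, since it is a cross between the Fibonacci sequence (A000045) and the Padovan sequence (A000931).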
Let us call this sequence the d-middle sequence. Because of the equality $M_{n,2} = F_n$, the d-middle sequence can be considered a generalization of the Fibonacci sequence.

The d-middle sequence defined in (4) differs slightly from the generalization of the sequence A000930 in [8] because of its initial values.

The next algorithm computes $M_{n,d}$, using an array $M_0, M_1, \ldots, M_{d-1}$ to store the necessary previous elements:

MIDDLE(n, d)
1 M_0 ← 0
2 for i ← 1 to d − 1
3     do M_i ← 1
4 for i ← d to n
5     do M_{i mod d} ← M_{(i−1) mod d} + M_{i mod d}
6        print M_{i mod d}
7 return
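The same rolling-array idea can be written in Python as follows. This is a sketch: the name `middle` is hypothetical, and it returns the printed values as a list instead of printing them. Just before step i, the cell `i % d` still holds $M_{i-d,d}$, which is exactly the second summand needed by (4), so d memory cells suffice.

```python
def middle(n: int, d: int) -> list[int]:
    """Values printed by MIDDLE(n, d): M_{d,d}, M_{d+1,d}, ..., M_{n,d}.

    Only d cells are used. Just before step i, cell i % d still holds
    M_{i-d,d}, which is exactly the second summand of recurrence (4).
    """
    if d < 2:
        raise ValueError("recurrence (4) assumes d >= 2")
    M = [0] + [1] * (d - 1)                     # M_0 = 0, M_1 = ... = M_{d-1} = 1
    printed = []
    for i in range(d, n + 1):
        M[i % d] = M[(i - 1) % d] + M[i % d]    # M_i = M_{i-1} + M_{i-d}
        printed.append(M[i % d])
    return printed


print(middle(10, 2))  # [1, 2, 3, 5, 8, 13, 21, 34, 55]
print(middle(10, 3))  # [1, 2, 3, 4, 6, 9, 13, 19]
```

For d = 2 the output is the Fibonacci numbers $F_2, \ldots, F_{10}$, matching $M_{n,2} = F_n$.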

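Using the generating function $M_d(z) = \sum_{n \ge 0} M_{n,d} z^n$, the following closed formula is obtained:
$$M_d(z) = \frac{z}{1 - z - z^d}. \tag{5}$$
This can be used to compute the sum $S_{n,d} = \sum_{i=1}^{n} M_{i,d}$, which is the coefficient of $z^{n+d-1}$ in the expansion of the function
$$\frac{z^d}{1 - z - z^d} \cdot \frac{1}{1 - z} = \frac{z^d}{1 - z - z^d} + \frac{z}{1 - z - z^d} - \frac{z}{1 - z}$$
(indeed, the left-hand side equals $z^{d-1} M_d(z)/(1-z)$, and division by $1 - z$ forms partial sums). The coefficients of $z^{n+d-1}$ in the three terms on the right are $M_{n,d}$, $M_{n+d-1,d}$ and 1, respectively. So $S_{n,d} = M_{n+(d-1),d} + M_{n,d} - 1 = M_{n+d,d} - 1$ by (4). Therefore
$$\sum_{i=1}^{n} M_{i,d} = M_{n+d,d} - 1. \tag{6}$$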
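Identities (5) and (6) can be checked numerically with truncated formal power series. The sketch below (the helper names `series`, `poly_mul` and `check_formula_6` are our own) expands $z/(1 - z - z^d)$ by power-series division, extracts the coefficient of $z^{n+d-1}$ in $z^d/((1 - z - z^d)(1 - z))$, and compares both with the partial sums of $M_{i,d}$. It assumes d ≥ 2, as in (4).

```python
def series(num: list[int], den: list[int], terms: int) -> list[int]:
    """First `terms` coefficients of num(z)/den(z) as a formal power series (den[0] == 1)."""
    if den[0] != 1:
        raise ValueError("den[0] must be 1")
    c: list[int] = []
    for m in range(terms):
        v = num[m] if m < len(num) else 0
        for k in range(1, min(m, len(den) - 1) + 1):
            v -= den[k] * c[m - k]          # from den(z) * c(z) = num(z)
        c.append(v)
    return c


def poly_mul(p: list[int], q: list[int]) -> list[int]:
    """Product of two polynomials given by coefficient lists."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r


def check_formula_6(n_max: int, d: int) -> bool:
    """Check sum_{i<=n} M_{i,d} = [z^(n+d-1)] z^d/((1-z-z^d)(1-z)) = M_{n+d,d} - 1."""
    if d < 2:
        raise ValueError("recurrence (4) assumes d >= 2")
    terms = n_max + d + 1
    den = [1, -1] + [0] * (d - 2) + [-1]                       # 1 - z - z^d
    M = series([0, 1], den, terms)                             # (5): z / (1 - z - z^d)
    H = series([0] * d + [1], poly_mul(den, [1, -1]), terms)   # z^d / ((1 - z - z^d)(1 - z))
    for n in range(1, n_max + 1):
        partial_sum = sum(M[1:n + 1])                          # M_{1,d} + ... + M_{n,d}
        if not partial_sum == H[n + d - 1] == M[n + d] - 1:
            return False
    return True


print(series([0, 1], [1, -1, -1], 10))                   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
print(all(check_formula_6(25, d) for d in range(2, 7)))  # True
```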

Theorem 5 $S(n,d) = M_{n+d,d} - 1$, where n > d and $M_{n,d}$ is the n-th element of the d-middle sequence.

Proof. The proof is similar to that of Theorem 4, taking into account formula (6). □
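Theorem 5 can also be tested against a direct count. Because all letters of a rainbow word differ, distinct position sets give distinct subwords, so $S(n,d)$ is simply the number of nonempty position sets whose consecutive gaps are all at least d. The sketch below (hypothetical names `S_brute` and `middle_term`) compares this count with $M_{n+d,d} - 1$ computed from recurrence (4); it is exponential in n and meant only for small cases.

```python
from itertools import combinations


def S_brute(n: int, d: int) -> int:
    """Number of nonempty position sets of {0, ..., n-1} with all consecutive gaps >= d."""
    return sum(
        1
        for s in range(1, n + 1)
        for idx in combinations(range(n), s)
        if all(b - a >= d for a, b in zip(idx, idx[1:]))
    )


def middle_term(n: int, d: int) -> int:
    """M_{n,d} from recurrence (4) with M_0 = 0, M_1 = ... = M_{d-1} = 1 (d >= 2)."""
    seq = [0] + [1] * (d - 1)
    for i in range(d, n + 1):
        seq.append(seq[i - 1] + seq[i - d])
    return seq[n]


for d in range(2, 5):
    for n in range(d + 1, 13):          # Theorem 5 is stated for n > d
        assert S_brute(n, d) == middle_term(n + d, d) - 1
print(S_brute(6, 2), middle_term(8, 2) - 1)  # 20 20
```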



2.2 Computing by mathematical formulas 

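Theorem 6
$$S(n,d) = \sum_{k \ge 0} \binom{n-(d-1)k}{k+1}, \qquad \text{for } n \ge 2,\ d \ge 1.$$

Proof. Let us consider the generating function $G(z) = \frac{1}{1-z} = 1 + z + z^2 + \cdots$. Then, taking into account formula (5), we obtain
$$M_d(z) = zG(z + z^d) = z + z(z+z^d) + z(z+z^d)^2 + \cdots + z(z+z^d)^j + \cdots.$$
The general term in this expansion is equal to
$$z(z+z^d)^j = z^{j+1} \sum_{k=0}^{j} \binom{j}{k} z^{(d-1)k},$$
and the coefficient of $z^{n+1}$, which comes from the terms with $j = n-(d-1)k$, is equal to
$$M_{n+1,d} = \sum_{k \ge 0} \binom{n-(d-1)k}{k}.$$
The coefficient of $z^{n+d}$ is
$$M_{n+d,d} = \sum_{k \ge 0} \binom{n+d-1-(d-1)k}{k}.$$
By Theorem 5, $S(n,d) = M_{n+d,d} - 1$, and an easy computation (the term $k = 0$ equals 1, and in the remaining terms we replace k by k + 1) yields
$$S(n,d) = \sum_{k \ge 0} \binom{n-(d-1)k}{k+1}.$$ □

Theorem 7
$$b_{n+1,d}(1) = \sum_{k \ge 0} \binom{n-(d-1)k}{k}, \qquad \text{for } n \ge 1,\ d \ge 1.$$

Proof. From $b_{n+1,d}(1) = M_{n+1,d}$ and
$$M_{n+1,d} = \sum_{k \ge 0} \binom{n-(d-1)k}{k},$$
obtained in the proof of Theorem 6, the statement follows. □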
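Theorems 6 and 7 give closed sums that need no recursion. The Python sketch below (hypothetical names `S_binomial`, `b1_binomial` and `middle_term`) evaluates them with `math.comb`, stopping as soon as the upper argument of the binomial coefficient falls below the lower one, since all later terms vanish; the final loop cross-checks both sums against recurrence (4) and Theorem 5.

```python
from math import comb


def S_binomial(n: int, d: int) -> int:
    """Theorem 6: S(n, d) = sum_{k >= 0} C(n - (d-1)k, k+1)."""
    total, k = 0, 0
    while n - (d - 1) * k >= k + 1:      # afterwards every term is 0
        total += comb(n - (d - 1) * k, k + 1)
        k += 1
    return total


def b1_binomial(n: int, d: int) -> int:
    """Theorem 7: b_{n+1,d}(1) = sum_{k >= 0} C(n - (d-1)k, k)."""
    total, k = 0, 0
    while n - (d - 1) * k >= k:
        total += comb(n - (d - 1) * k, k)
        k += 1
    return total


def middle_term(n: int, d: int) -> int:
    """M_{n,d} from recurrence (4), used here only for cross-checking (d >= 2)."""
    seq = [0] + [1] * (d - 1)
    for i in range(d, n + 1):
        seq.append(seq[i - 1] + seq[i - d])
    return seq[n]


for d in range(2, 6):
    for n in range(1, 30):
        assert b1_binomial(n, d) == middle_term(n + 1, d)
        assert S_binomial(n, d) == middle_term(n + d, d) - 1
print(S_binomial(6, 2), b1_binomial(7, 2))  # 20 21
```

For d = 1 both sums reduce to binomial row sums, giving $2^n - 1$ and $2^n$.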



2.3 Computing by graph algorithms 

To compute the super-d-complexity of a rainbow word of length n, let us consider the word $a_1 a_2 \cdots a_n$ and the corresponding digraph $G = (V, E)$, with
$$V = \{a_1, a_2, \ldots, a_n\},$$
$$E = \{(a_i, a_j) \mid j - i \ge d,\ i = 1, 2, \ldots, n,\ j = 1, 2, \ldots, n\}.$$
For n = 6, d = 2 see Figure 1.

The adjacency matrix $A = (a_{ij})$ of the graph is defined by:
$$a_{ij} = \begin{cases} 1, & \text{if } j - i \ge d, \\ 0, & \text{otherwise}, \end{cases} \qquad \text{for } i = 1, 2, \ldots, n,\ j = 1, 2, \ldots, n.$$
Figure 1: Graph for super-2-subwords when n = 6.



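The entry in row i and column j of $A^k$ (where $A^k = A^{k-1}A$, with $A^1 = A$) represents the number of directed paths of length k from $a_i$ to $a_j$. Because the graph has no directed cycles, every path has at most n − 1 edges, so these powers eventually vanish. If I is the identity matrix (with elements equal to 1 only on the main diagonal, and 0 otherwise), let us define the matrix $R = (r_{ij})$:
$$R = I + A + A^2 + \cdots + A^k, \qquad \text{where } A^{k+1} = O \text{ (the null matrix)}.$$
Every super-d-subword corresponds to exactly one directed path (a single letter to a path of length 0, counted by I), so the super-d-complexity of a rainbow word is then
$$S(n,d) = \sum_{i=1}^{n} \sum_{j=1}^{n} r_{ij}.$$

To compute the matrix R, we define a variant of the well-known Warshall algorithm (see, for example, [?]):

WARSHALL(A, n)
1 W ← A
2 for k ← 1 to n
3     do for i ← 1 to n
4           do for j ← 1 to n
5                 do W_ij ← W_ij + W_ik W_kj
6 return W

From W we easily obtain R = I + W.

For example, let us consider the graph in Figure 1. The corresponding adjacency matrix is:
$$A = \begin{pmatrix}
0 & 0 & 1 & 1 & 1 & 1 \\
0 & 0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}.$$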
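After applying the Warshall algorithm we obtain:
$$W = \begin{pmatrix}
0 & 0 & 1 & 1 & 2 & 3 \\
0 & 0 & 0 & 1 & 1 & 2 \\
0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}, \qquad
R = I + W = \begin{pmatrix}
1 & 0 & 1 & 1 & 2 & 3 \\
0 & 1 & 0 & 1 & 1 & 2 \\
0 & 0 & 1 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix},$$
and then $S(6,2) = 20$, the sum of the entries of R. (The row sums of R, namely 8, 5, 3, 2, 1, 1, are the values $b_{6,2}(i)$ given at the beginning of Section 2.)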
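The counting variant of Warshall's algorithm is short enough to sketch in Python. The helper names below (`adjacency`, `warshall_count`, `super_d_complexity_graph`) are hypothetical, indices are 0-based, and d ≥ 1 is assumed so that the graph has no loops; the in-place update is then safe because $W_{kk}$ stays 0 during step k.

```python
def adjacency(n: int, d: int) -> list[list[int]]:
    """Adjacency matrix of the digraph: entry (i, j) is 1 iff j - i >= d (0-based)."""
    return [[1 if j - i >= d else 0 for j in range(n)] for i in range(n)]


def warshall_count(A: list[list[int]]) -> list[list[int]]:
    """Counting variant of Warshall's algorithm: W_ij <- W_ij + W_ik * W_kj.

    For an acyclic graph (so W_kk stays 0), the result is A + A^2 + ...,
    i.e. W[i][j] is the number of directed paths of length >= 1 from i to j.
    """
    n = len(A)
    W = [row[:] for row in A]          # W <- A (copied, so A is left unchanged)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                W[i][j] += W[i][k] * W[k][j]
    return W


def super_d_complexity_graph(n: int, d: int) -> int:
    """S(n, d) as the sum of the entries of R = I + W (I contributes n)."""
    W = warshall_count(adjacency(n, d))
    return n + sum(sum(row) for row in W)


W = warshall_count(adjacency(6, 2))
print(W[0])                            # [0, 0, 1, 1, 2, 3]
print(super_d_complexity_graph(6, 2))  # 20
```

The triple loop costs $O(n^3)$ operations, which is more than the recurrences above need, but the same loop structure is what the set-valued variant below reuses.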

The Warshall algorithm combined with the Latin square method can be used to obtain all nontrivial (of length at least 2) super-d-subwords of a given rainbow word $a_1 a_2 \cdots a_n$. Let us consider a matrix A with entries $A_{ij}$ which are sets of words. Initially this matrix is defined as:
$$A_{ij} = \begin{cases} \{a_i a_j\}, & \text{if } j - i \ge d, \\ \emptyset, & \text{otherwise}, \end{cases} \qquad \text{for } i = 1, 2, \ldots, n,\ j = 1, 2, \ldots, n.$$

If A and B are sets of words, AB is the set of concatenations of each word from A with each word from B:
$$AB = \{ab \mid a \in A,\ b \in B\}.$$



If $s = s_1 s_2 \cdots s_p$ is a word, let us denote by $'s$ the word obtained from s by erasing its first character: $'s = s_2 s_3 \cdots s_p$. Let us denote by $'A_{ij}$ the set $A_{ij}$ in which we erase the first character of each element. In this case $'A$ is a matrix with entries $'A_{ij}$.

Starting with the matrix A defined as before, the algorithm to obtain all nontrivial super-d-subwords is the following:

WARSHALL-LATIN(A, n)
1 W ← A
2 for k ← 1 to n
3     do for i ← 1 to n
4           do for j ← 1 to n
5                 do if W_ik ≠ ∅ and W_kj ≠ ∅
6                       then W_ij ← W_ij ∪ W_ik 'W_kj
7 return W

The set of nontrivial super-d-subwords is
$$\bigcup_{i,j \in \{1,2,\ldots,n\}} W_{ij}.$$

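For n = 8, d = 3 the initial matrix is:
$$A = \begin{pmatrix}
\emptyset & \emptyset & \emptyset & \{ad\} & \{ae\} & \{af\} & \{ag\} & \{ah\} \\
\emptyset & \emptyset & \emptyset & \emptyset & \{be\} & \{bf\} & \{bg\} & \{bh\} \\
\emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \{cf\} & \{cg\} & \{ch\} \\
\emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \{dg\} & \{dh\} \\
\emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \{eh\} \\
\emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset \\
\emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset \\
\emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset
\end{pmatrix}$$

The result of the algorithm in this case is:
$$W = \begin{pmatrix}
\emptyset & \emptyset & \emptyset & \{ad\} & \{ae\} & \{af\} & \{ag, adg\} & \{ah, adh, aeh\} \\
\emptyset & \emptyset & \emptyset & \emptyset & \{be\} & \{bf\} & \{bg\} & \{bh, beh\} \\
\emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \{cf\} & \{cg\} & \{ch\} \\
\emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \{dg\} & \{dh\} \\
\emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \{eh\} \\
\emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset \\
\emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset \\
\emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset
\end{pmatrix}$$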
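The set-valued variant can be sketched in Python with built-in sets (the name `warshall_latin` is hypothetical, positions are 0-based, and d ≥ 1 is assumed). A word in $W_{ik}$ ends with the k-th letter and a word in $W_{kj}$ starts with it, so the concatenation $W_{ik}\,'W_{kj}$ is formed by dropping the first character of the second word.

```python
def warshall_latin(word: str, d: int) -> set[str]:
    """Set-valued Warshall variant: all super-d-subwords of length >= 2 of `word`.

    W[i][j] holds words that start with word[i] and end with word[j].
    Joining u in W[i][k] with v in W[k][j] as u + v[1:] drops the shared
    letter word[k] (the 'W_kj operation). Python sets discard repeated words.
    """
    n = len(word)
    W = [[{word[i] + word[j]} if j - i >= d else set() for j in range(n)]
         for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if W[i][k] and W[k][j]:
                    W[i][j] |= {u + v[1:] for u in W[i][k] for v in W[k][j]}
    return set().union(*(W[i][j] for i in range(n) for j in range(n)))


print(sorted(warshall_latin("abcdefgh", 3), key=lambda w: (len(w), w)))
```

The printed list contains the 15 two-letter words of the initial matrix followed by adg, adh, aeh and beh, i.e. the 19 words of the result above; together with the 8 single letters this gives $S(8,3) = 27$. Because the cells are sets, the same function returns repeated words only once when applied to a non-rainbow word.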



3 The general case 



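In the general case, for any word $w \in \Sigma^*$, let us denote the super-d-complexity by $S_w(d)$. We have
$$\left\lceil \frac{|w|}{d} \right\rceil \le S_w(d) \le S(|w|, d),$$
where |w| is the length of w. The minimum value is obtained for a trivial word $w = a \ldots a$ (whose super-d-subwords are exactly $a, a^2, \ldots, a^{\lceil |w|/d \rceil}$), and the maximum one for a rainbow word.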

The algorithm WARSHALL-LATIN can be used for non-rainbow words too, with the remark that repeated subwords must be eliminated. For the word aabbbaaa and d = 3 the result is: aa, ab, aba, ba.
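For small words, the example above and the bounds on $S_w(d)$ stated earlier in this section can be confirmed by exhaustive enumeration instead of WARSHALL-LATIN; collecting the subwords in a set removes the repetitions. The sketch below is ours (the helper name `super_d_subwords` is hypothetical) and is exponential in |w|.

```python
from itertools import combinations, product
from math import ceil


def super_d_subwords(word: str, d: int) -> set[str]:
    """Distinct super-d-subwords of an arbitrary word; the set drops repetitions."""
    n = len(word)
    return {
        "".join(word[i] for i in idx)
        for s in range(1, n + 1)
        for idx in combinations(range(n), s)
        if all(b - a >= d for a, b in zip(idx, idx[1:]))
    }


# Example from the text: nontrivial (length >= 2) super-3-subwords of aabbbaaa
nontrivial = sorted(w for w in super_d_subwords("aabbbaaa", 3) if len(w) >= 2)
print(nontrivial)  # ['aa', 'ab', 'aba', 'ba']

# Bounds ceil(|w|/d) <= S_w(d) <= S(|w|, d) over all binary words of length 7, d = 2
n, d = 7, 2
upper = len(super_d_subwords("abcdefg", d))      # S(7, 2), from a rainbow word
values = [len(super_d_subwords("".join(w), d)) for w in product("ab", repeat=n)]
assert min(values) == ceil(n / d)                # attained by aaaaaaa and bbbbbbb
assert max(values) <= upper
print(upper)  # 33
```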

Let us denote by f(m, n, d) the maximum value of the super-d-complexity over all words of length n over an alphabet of m letters:
$$f(m,n,d) = \max_{w \in \Sigma^n,\ |\Sigma| = m} S_w(d).$$
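For small n the maximum can be computed by exhaustive search over all $m^n$ words. The Python sketch below does exactly that (the names `complexity` and `f` are hypothetical); its cost grows roughly like $m^n \cdot 2^n$, so it is only suitable for checking small cases, such as the values of f(2, n, d) listed below.

```python
from itertools import combinations, product
from string import ascii_lowercase


def complexity(word: str, d: int) -> int:
    """S_w(d): number of distinct super-d-subwords of `word`, by brute force."""
    n = len(word)
    return len({
        "".join(word[i] for i in idx)
        for s in range(1, n + 1)
        for idx in combinations(range(n), s)
        if all(b - a >= d for a, b in zip(idx, idx[1:]))
    })


def f(m: int, n: int, d: int) -> int:
    """Maximum of S_w(d) over all words w of length n on an m-letter alphabet (m <= 26)."""
    alphabet = ascii_lowercase[:m]
    return max(complexity("".join(w), d) for w in product(alphabet, repeat=n))


print([f(2, n, n - 1) for n in range(3, 8)])  # [3, 3, 3, 3, 3]
```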

For f(2, n, d) the following statements are true and can be easily proved:

• $f(2, n, n-1) = 3$ for n ≥ 3.
• $f(2, n, n-2) = 5$ for n ≥ 4.
• If $n/2 \le d \le n-3$, then $f(2, n, d) = 6$ for n ≥ 6.
• If n is even, then $f(2, n, n/2 - 1) = 10$ for n ≥ 6.
• If n is odd, then $f(2, n, (n-1)/2) = 7$ for n ≥ 5.

Table 2: Values of f(2, n, d).

Conclusions

The super-d-complexity of finite rainbow words can be obtained by recursive algorithms, by direct mathematical formulas, and by graph algorithms, all of which are presented in this paper. The advantage of the graph algorithms is that they can easily be modified to obtain not only the complexity but also all the super-d-subwords. This method can also be adapted to obtain the super-d-subwords of general words, when no restriction on the letters is given.

In the set of all words of a given length over a given alphabet, the maximum super-d-complexity may be of interest. Here we present only some easily proved results; an extensive study remains for the future.